ci: add scheduled benchmark runs (#1639) #1645
Conversation
Could you please rebase this PR on top of #1642 (or whatever PR it's dependent on)? I'm not sure what needs to be reviewed here vs. code that's being reviewed in a different PR.
```yaml
summary-always: true
auto-push: true
fail-on-alert: true
```
q: what's the rationale for setting summary-always and fail-on-alert to true?
Both are about visibility and catching regressions early. summary-always ensures results are always visible in the job summary. fail-on-alert ensures we don't silently regress - if performance drops significantly, the workflow fails rather than quietly recording bad data.
> if performance drops significantly

Hmm, what does it mean for performance to drop "significantly"?
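For context, github-action-benchmark defines "significant" via its `alert-threshold` input (default `200%`, i.e. an alert fires when a result is twice as bad as the previous data point). A minimal sketch of how these inputs fit together; the step name, tool, and file path below are illustrative assumptions, not taken from this PR:

```yaml
- name: Publish benchmark results
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: 'customSmallerIsBetter'   # assumed tool; the PR diff doesn't show it
    output-file-path: results.json  # hypothetical results file
    github-token: ${{ secrets.GITHUB_TOKEN }}
    summary-always: true            # always write results to the job summary
    auto-push: true                 # push data points to the benchmark data branch
    fail-on-alert: true             # fail the job instead of quietly recording a regression
    alert-threshold: '200%'         # "significant" = 2x worse than the previous run
```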
rkuris left a comment
Nit: Overall, this seems like a lot of somewhat-fragile code to parse JSON instead of just specifying the parameters in the normal GitHub style.
I think we can save a ton of money by picking a much smaller instance size. I think i4i.xlarge will work at roughly 1/8 the cost.
.github/benchmark-schedules.json (Outdated)
```json
{
  "name": "daily-40m-41m",
  "cron": "0 5 * * *",
  "runner": "avago-runner-i4i-4xlarge-local-ssd",
```
This instance type is likely wasting a ton of space and is double the cost of i4i-2xlarge and 4x the cost of i4i-xlarge. The latter has 937G of local disk. Let's lower the cost of this by using smaller instances.
The infra team opened a Jira ticket for this; we can track it here: https://ava-labs.atlassian.net/browse/INFRA-6420
```yaml
description: 'Firewood commit/branch/tag to test (leave empty to use the commit that triggered the workflow)'
default: ''
libevm:
  description: 'libevm commit/branch/tag to test (leave empty to skip)'
```
What does 'skip' mean here? Does that mean use main/master?
Skip means using the libevm defined in AvalancheGo; I updated the description.
```yaml
default: ''
config:
  description: 'Config (e.g., firewood, hashdb)'
  default: ''
```
Should this be 'firewood'? What happens if none is specified in the JSON?
Why do we even need a default here? Same with many of the others. Is it required?
default: '' isn't strictly required; omitting it has the same behavior (empty string when not provided). We use it for consistency with AvalancheGo's pattern.
Yes, you're right; in this context the default value for config should be `firewood`.
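For illustration, a hedged sketch of what an explicit default would look like for the `config` input (the input name and description come from the diff above; `required` and the surrounding layout are assumptions):

```yaml
on:
  workflow_dispatch:
    inputs:
      config:
        description: 'Config (e.g., firewood, hashdb)'
        required: false
        default: 'firewood'  # explicit default instead of ''
```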
```js
const fs = require('fs');
let matrix;

if (context.eventName === 'schedule') {
```
Is this true when fired from the scheduled cron job and false otherwise?
Yes, `context.eventName` returns the trigger type:
- `schedule` when fired by cron
- `workflow_dispatch` when triggered manually
- `push`, `pull_request`, etc. for other triggers
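As a sketch of how that branch plays out in an actions/github-script step (the matrix contents and fallback values here are placeholders, not the PR's actual logic):

```yaml
- uses: actions/github-script@v7
  with:
    script: |
      const fs = require('fs');
      let matrix;
      if (context.eventName === 'schedule') {
        // Cron-triggered run: load every configured benchmark from the JSON file.
        matrix = JSON.parse(fs.readFileSync('.github/benchmark-schedules.json', 'utf8'));
      } else {
        // workflow_dispatch (or other): build a one-entry matrix from the inputs.
        matrix = [{ config: context.payload.inputs?.config || 'firewood' }];
      }
      core.setOutput('matrix', JSON.stringify(matrix));
```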
Implements workflow to trigger C-Chain reexecution benchmarks in AvalancheGo and track Firewood performance over time. Supports task-based and custom parameter modes. Results are stored in GitHub Pages via github-action-benchmark.
- Refactor `bench-cchain` to trigger track-performance.yml instead of
directly calling AvalancheGo's workflow
- Add input validation and helpful error messages for test/custom params
- Use commit SHA instead of branch name for reproducibility
- Fix AvalancheGo workflow input: use `with-dependencies` format
("firewood=abc,libevm=xyz") instead of separate firewood-ref/libevm-ref
- Remove status-cchain, list-cchain, help-cchain (use GitHub UI instead)
- Remove debug logging from track-performance.yml
Remove local developer tooling (justfile recipe, flake.nix, METRICS.md) to reduce PR scope. These will be submitted in a follow-up PR after the CI workflow changes are merged.
Daily (40M→41M) and weekly (50M→60M) benchmarks with JSON-based config and matrix strategy for cleaner workflow management.
Force-pushed from 2af94f6 to 6ee9ccc
…coded branch name used for temp. testing
- Replace `avago-runner-i4i-4xlarge-local-ssd` with `avago-runner-i4i-2xlarge-local-ssd` for daily and weekly benchmarks.
- Add new input options for `firewood` and `libevm` commits/branches/tags.
- Extend runner choices with additional configurations in the workflow.
- Remove `.github/benchmark-schedules.json` in favor of inline configurations.
- Replace matrix-based strategy with direct output handling for cleaner and more explicit workflow logic.
- Preserve manual and scheduled benchmark support with optimized input handling.
Updated. I was trying to stay consistent with AvalancheGo's JSON config approach, but I agree with you: inline with native GitHub outputs feels more natural here. There's no strong need for consistency on this at this point in time.
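For illustration, the inline approach can key off `github.event.schedule`, which holds the cron expression that fired the run. A minimal sketch; the daily cron comes from the earlier diff, while the weekly cron, job layout, and output names are assumptions:

```yaml
on:
  schedule:
    - cron: '0 5 * * *'  # daily 40M→41M (from the earlier diff)
    - cron: '0 6 * * 0'  # hypothetical weekly 50M→60M slot
  workflow_dispatch:
jobs:
  resolve-params:
    runs-on: ubuntu-latest
    outputs:
      range: ${{ steps.params.outputs.range }}
    steps:
      - id: params
        run: |
          # github.event.schedule is empty for workflow_dispatch runs
          if [ "${{ github.event.schedule }}" = "0 5 * * *" ]; then
            echo "range=40m-41m" >> "$GITHUB_OUTPUT"
          else
            echo "range=50m-60m" >> "$GITHUB_OUTPUT"
          fi
```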
@rkuris while testing the scheduled benchmark workflow, I uncovered a couple of issues:

When it's time to cut the next release, this will naturally resolve. Alternatively, we could cut a quick …. Can the team prepare a …?
…ale workspace issues
Test case: manual dispatch with custom dependencies
Command: …
Run: https://github.com/ava-labs/firewood/actions/runs/21489879261
- Add workflow_dispatch trigger to gh-pages.yaml for manual runs
- Trigger gh-pages deployment from track-performance.yml after benchmark results are pushed to the benchmark-data branch

This ensures benchmark results are immediately visible on GitHub Pages without waiting for a push to main.
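The commit message doesn't show the dispatch mechanism; one plausible sketch uses the GitHub CLI from track-performance.yml once gh-pages.yaml accepts workflow_dispatch (the step name and `--ref` are assumptions, and the calling workflow would need `actions: write` permission):

```yaml
- name: Trigger gh-pages deployment
  env:
    GH_TOKEN: ${{ github.token }}  # requires actions: write to dispatch another workflow
  run: gh workflow run gh-pages.yaml --ref main
```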
Why this should be merged
This patch closes #1639: it adds automated benchmark triggers to catch performance regressions without manual intervention.
Note: should be merged after #1493
How this works
Manual dispatch still works via workflow inputs.
How this was tested